AIbase

# High-precision Reward Model

Skywork Reward Llama 3.1 8B V0.2
A reward model built on Llama-3.1-8B-Instruct and trained on 80K high-quality preference pairs. It is designed to judge preferences in complex scenarios.
Large Language Model Transformers
Skywork
25.99k
35
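
As a rough illustration of what training on preference pairs involves, here is a minimal sketch of a Bradley-Terry style pairwise loss, a common objective for reward models. The listing does not state which objective this model uses, so the loss choice and the function names `pairwise_preference_loss` and `preference_accuracy` are assumptions made for illustration only.

```python
import math


def pairwise_preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Bradley-Terry style loss: -log(sigmoid(reward_chosen - reward_rejected)).

    Hypothetical helper; the objective is assumed, not taken from the listing.
    """
    margin = reward_chosen - reward_rejected
    # -log(sigmoid(m)) equals softplus(-m); this form avoids overflow for large |m|.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


def preference_accuracy(pairs: list[tuple[float, float]]) -> float:
    """Fraction of (chosen_score, rejected_score) pairs ranked correctly."""
    return sum(chosen > rejected for chosen, rejected in pairs) / len(pairs)
```

The loss shrinks as the model scores the preferred response higher than the rejected one, and equals log 2 when the two scores are tied.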
PPO LunarLander V2
A reinforcement learning agent trained with the PPO algorithm to solve the landing task in the LunarLander-v2 environment.
Physics Model
araffin
65
18
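
For readers unfamiliar with PPO, the core of the algorithm is its clipped surrogate objective, sketched below for a single sample. The function name `ppo_clipped_objective` and the default `clip_eps=0.2` are assumptions for illustration; the listing does not give this model's hyperparameters.

```python
def ppo_clipped_objective(ratio: float, advantage: float, clip_eps: float = 0.2) -> float:
    """Per-sample PPO clipped surrogate: min(r * A, clip(r, 1 - eps, 1 + eps) * A).

    ratio is the new-policy probability of the action divided by the old-policy
    probability; clip_eps=0.2 is an assumed, commonly used value.
    """
    clipped_ratio = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
    # Taking the minimum removes any incentive to push the ratio outside the clip range.
    return min(ratio * advantage, clipped_ratio * advantage)
```

Training maximizes the average of this quantity over sampled actions, which keeps each policy update close to the policy that collected the data.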
© 2025 AIbase